---
title: "Narrative of original analysis"
author: "Renata Diaz"
date: "5/14/2019"
output: html_notebook
---

```{r setup, include = F}
library(replicatebecs)
download_data = FALSE
```

A walkthrough of Ernest (2005)'s original analytical approach, from close reading of the paper. 

## Questions

1. Is energy use across body size categories (regardless of species) uniform or multimodal?
  - uniform would correspond generally to energetic equivalence/Damuth's rule.
  - multimodal might suggest different resource availability for different body sizes.
2. If energy use is not uniform across body size categories, does the species level body size distribution correspond to modes of energy use?
  - i.e. are there more species with mean body sizes around the modes of the body size-energy use distribution?
  - if so, maybe it's good to be certain sizes, and species accumulate at those optima.


## Data

#### Ernest data
Ernest drew data from the Andrews LTER, the Sevilleta, Niwot Ridge, and Portal. 

The data available online do not quite match the descriptive statistics reported in Ernest (2005). 

#### Translation to `replicate-becs`

Download raw data. By default data will be stored in subdirectories of `replicate-becs/data/paper/raw/` for each site. 
```{r download data if not downloaded, include = F}
if(download_data) download_raw_paper_data()
```

```{r download raw data, eval = F}
download_raw_paper_data()
```

Process raw data into the appropriate format. This is a data table with a record for each individual and columns for `species` and `weight` in grams. By default these tables will be stored in subdirectories of `replicate-becs/data/paper/processed`. 

```{r process paper data}
process_raw_data()
```

Load data tables for each community. There should be 9 communities.

```{r load community data, echo=TRUE}
communities <- load_paper_data()

length(communities)
```

Each community should be a data table with columns for species and size for each individual, for example:

```{r inspect community data}
names(communities)
head(communities[[1]])
```


## Constructing distributions/metrics

### Body size-energy use distributions (BSED)

#### Ernest method

- Per individual, calculate metabolic rate as $B \propto M^{3/4}$, where $M$ is mass in grams.
- Sum energy use of all individuals in body size classes of 0.2 natural log units.
    - Also try classes of 0.1 and 0.3 natural log units
- Convert raw energy use values for each body size class into the proportion of all the energy used in that community used by that body size class. This allows for comparisons between communities.
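
The steps above can be sketched in base R. This is a minimal illustration, not the `replicate-becs` implementation: `sketch_bsed` and `example_masses` are hypothetical names, and assigning classes by flooring $\ln(M)$ to the nearest multiple of the class width is an assumption about how class boundaries are placed.

```r
# Hypothetical helper, not part of replicatebecs.
sketch_bsed <- function(mass_g, ln_units = 0.2) {
  # Metabolic rate up to a proportionality constant: B = M^(3/4)
  energy <- mass_g^0.75
  # Assumed binning: lower edge of the ln(mass) class of width ln_units
  size_class <- floor(log(mass_g) / ln_units) * ln_units
  # Sum energy within each class, then convert to a proportion of the community total
  class_energy <- tapply(energy, size_class, sum)
  data.frame(
    size_class = as.numeric(names(class_energy)),
    prop_energy = as.numeric(class_energy) / sum(energy)
  )
}

# Made-up masses in grams, for illustration only
example_masses <- c(8, 9, 10, 25, 26, 120)
example_bsed <- sketch_bsed(example_masses)
example_bsed
```

Because the output is a proportion per class, `prop_energy` sums to 1 for any community, which is what makes BSEDs comparable across communities of different total abundance.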

#### Translation to `replicate-becs`

For every individual, calculate metabolic rate and assign to a size class. 

```{r construct BSEDs}
communities_energy <- lapply(communities, FUN = make_community_table, ln_units = 0.2)

head(communities_energy[[1]])
```

For each community, sum total energy use for each size class, and convert to the proportion of total energy use for that community.

```{r make bseds}
bseds <- lapply(communities_energy, FUN = make_bsed)

head(bseds[[1]])
```

```{r plot bseds, echo=FALSE, fig.height=10, fig.width=10}

bseds_plot <- plot_paper_dists(bseds, dist_type = 'bsed')

invisible(bseds_plot)
```

### Species-level body size distributions (BSD)

#### Ernest method
- Frequency distributions of mean mass of each species in a community.
- For plotting (but not statistics), smoothed using kernel density estimation. 
    - Gaussian kernel to mimic the actual body size distribution in log space
    - avg. std dev of the mean of the logged masses = smoothing parameter $h$
    - align sampling points with the midpoint of each size class in the BSED
    - after Manly 1996, "Are there clumps in body-size distributions?", _Ecology_
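
A rough sketch of the smoothing step, under stated assumptions: the data below are made up, the midpoints are chosen by hand, and $h$ is read here as the average within-species standard deviation of the logged masses, which is only one possible interpretation of the description above.

```r
# Hypothetical individual records: species and weight in grams
individuals <- data.frame(
  species = c("A", "A", "A", "B", "B", "C", "C", "C"),
  weight  = c(8, 9, 10, 24, 27, 110, 120, 135)
)

# Mean mass of each species, moved to the natural log scale
species_mean <- tapply(individuals$weight, individuals$species, mean)
log_mean_mass <- log(species_mean)

# Assumed reading of h: average within-species sd of logged masses
h <- mean(tapply(log(individuals$weight), individuals$species, sd))

# Sampling points at the midpoints of 0.2 ln-unit size classes (chosen by hand here)
midpoints <- seq(2.1, 4.9, by = 0.2)

# Gaussian kernel density estimate: average of normal densities centred on each species
kde <- sapply(midpoints, function(x) mean(dnorm(x, mean = log_mean_mass, sd = h)))
data.frame(midpoint = midpoints, density = kde)
```

Evaluating the estimate at the BSED class midpoints puts the smoothed BSD on the same x-axis as the BSED, so the two can be overlaid in a plot.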
    
#### Translation to `replicate-becs`

Calculate mean mass of each species in each community. 

```{r construct bsds} 

bsds <- lapply(communities, FUN = make_bsd) 

head(bsds[[1]])
```

```{r plot bsds, echo=FALSE, fig.height=10, fig.width=10}

bsds_plot <- plot_paper_dists(bsds, dist_type = 'bsd')

invisible(bsds_plot)
```


### Energetic dominance ($D_E$)

- Define "energy use modes" as contiguous body size classes where the energy use of each size class > 5% of the community total. 
    - i.e. a little bit more than the expectation if energy use is uniform across all body sizes
    - RMD is unsure of this. Doesn't the uniform expectation depend on the number of size classes?
- Calculate the total energy use for each species in the mode. 
- Calculate the "dominance" of the species with the highest energy use in that mode as $D_E = p_{max}$, where $p_{max}$ is the maximum proportion of energy use by any one species in a mode. 
    - "a modification of the Berger-Parker dominance index (Berger and Parker 1970)"
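
As a minimal sketch of the mode-finding and dominance steps (not the package's `energetic_dominance` function): the helper name, the column names `species`, `class_index`, and `energy`, and the use of consecutive integer class indices to detect contiguity are all assumptions made for illustration.

```r
# Hypothetical helper; column names and integer class indexing are assumptions.
sketch_energetic_dominance <- function(individuals, cutoff = 0.05) {
  # Proportion of community energy use in each size class
  class_prop <- tapply(individuals$energy, individuals$class_index, sum) /
    sum(individuals$energy)
  above <- sort(as.integer(names(class_prop)[class_prop > cutoff]))
  if (length(above) == 0) return(NULL)
  # Consecutive class indices share a mode; a gap starts a new mode
  mode_id <- cumsum(c(1, diff(above) != 1))
  results <- lapply(split(above, mode_id), function(classes) {
    in_mode <- individuals[individuals$class_index %in% classes, ]
    species_energy <- tapply(in_mode$energy, in_mode$species, sum)
    data.frame(
      min_class = min(classes),
      max_class = max(classes),
      # p_max: share of the mode's energy held by its top species
      dominance = max(species_energy, na.rm = TRUE) / sum(species_energy, na.rm = TRUE)
    )
  })
  do.call(rbind, results)
}

# Made-up community: classes 10 and 11 form one mode, class 16 another,
# and class 30 falls below the 5% cutoff
toy_community <- data.frame(
  species = c("A", "A", "B", "B", "C", "D"),
  class_index = c(10, 10, 11, 11, 16, 30),
  energy = c(5, 5, 6, 4, 10, 1)
)
toy_dominance <- sketch_energetic_dominance(toy_community)
toy_dominance
```

In this toy case the first mode is split evenly between species A and B, while the second mode contains only species C.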

#### Translation to `replicate-becs`

- Find contiguous size classes where each class has >5% of total energy use
- Calculate the total energy use for each species, and the proportion held by the species with the highest energy use ($p_{max}$)
- Return $p_{max}$ for every mode, along with the min and max size classes in that mode for each community

```{r energetic dominance}

energetic_dom <- lapply(communities_energy, FUN = energetic_dominance) 

head(energetic_dom[[1]])

```

- To plot, combine all modes from all communities and plot a histogram of $D_E$ values.

```{r plot Ed, echo=FALSE, fig.height=5, fig.width=5}
e_dom_plot <- plot_e_dom(energetic_dom)
e_dom_plot
```


- Out of curiosity, what happens if we define the modes with the cutoff proportional to the number of size classes (instead of a fixed 5%)?

```{r edom proportional}
energetic_dom_prop <- lapply(communities_energy, FUN = energetic_dominance, mode_cutoff = 'prop') 
```

```{r plot edom proportional, echo=FALSE, fig.height=5, fig.width=5}
e_dom_prop_plot <- plot_e_dom(energetic_dom_prop)
e_dom_prop_plot
```

- RMD: They're similar. 

## Statistical tests

### Comparing BSEDs to uniform

#### Ernest approach

- Use bootstrap sampling to compare to uniform distributions.
- For every community, draw 10000 samples (sim communities):
    - Same number of individuals as the empirical community, drawn from a uniform distribution ranging from the smallest to largest ~~body size~~ individual metabolic rate of any individual in that community.
- For sim communities and the empirical community, calculate a distribution overlap index ($DOI$):
    - $DOI = \sum_k {|y_{ak} - y_{bk}|}$ where $y$ is the value for size class $k$ in communities $a$ and $b$.
    - $DOI$ values will range from 0 (complete overlap) to 2 (no overlap). 
    - For the BSED bootstraps, community $a$ is the empirical or sim distribution, and community $b$ is a true uniform distribution ~~(i.e. $y_{bk} = \frac{1}{\max(k)}$ for all $k$)~~
        - "True uniform distribution": There are exactly the same number of individuals of every size. 
- Calculate the $DOI$ for all sim communities and the empirical.
- Compare the empirical $DOI$ to the distribution of sim $DOI$s. The p-value is the proportion of sim uniform distributions with $DOI$s greater than the empirical $DOI$.
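
The $DOI$ and the p-value calculation can be written directly from the definitions above. In this sketch, `doi` is a hypothetical helper and the sim values are made up; it is not the package's bootstrap code.

```r
# Distribution overlap index between two vectors of proportions
# defined over the same size classes
doi <- function(y_a, y_b) {
  stopifnot(length(y_a) == length(y_b))
  sum(abs(y_a - y_b))
}

# Identical distributions overlap completely
doi_identical <- doi(c(0.5, 0.5, 0), c(0.5, 0.5, 0))
# Distributions that share no size classes do not overlap at all
doi_disjoint <- doi(c(1, 0, 0), c(0, 0.5, 0.5))

# p-value: proportion of sim DOIs greater than the empirical DOI
# (hypothetical values standing in for bootstrap output)
sim_dois <- c(0.2, 0.4, 0.6, 0.8)
empirical_doi <- 0.5
p_value <- mean(sim_dois > empirical_doi)
```

The two extreme cases recover the stated bounds of 0 and 2, because each vector of proportions sums to 1.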

#### Translation to `replicate-becs`

- For a given empirical community, draw 10000 sim communities each with the same number of individuals $n$, with body sizes randomly drawn from a uniform distribution from the minimum to maximum body size in that community.
- Calculate the $DOI$ of each sim community compared to a true uniform distribution. 
    - True uniform distribution = every size from the minimum to the maximum size in the community (in steps of 0.1 g) has exactly one individual.

```{r BSED-uniform bootstrapped DOIs}

# Note: nbootstraps = 10 here, versus the 10000 samples described above
bsed_uniform_bootstraps <- lapply(communities, FUN = community_bootstrap, bootstrap_function = 'bootstrap_unif_bsed_doi', nbootstraps = 10)

```


_See issue #4 on GitHub._
    
    
```{r plot BSED-uniform bootstrap DOIs v empirical,echo=FALSE, fig.height=10, fig.width=10} 

bsed_uniform_bootstrap_plot <- plot_paper_dists(bsed_uniform_bootstraps, dist_type = 'bsed_bootstraps')

invisible(bsed_uniform_bootstrap_plot)
```

### Compare BSEDs among communities

#### Ernest approach
- For every pair of communities, create a pool of masses of all individuals from both communities.
- Draw two new communities with the same number of individuals as the empirical communities, pulling masses at random from the pool, with replacement.
- Calculate the DOI for the BSEDs of the two sample communities.
- Repeat 10000 times for each pair.
- The P value is the proportion of sample DOIs greater (i.e. less overlap) than the empirical value. 
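
A self-contained sketch of this resampling scheme for a single pair of communities. The masses are made up, `bsed_props` and `pair_doi` are hypothetical helpers, the shared class breaks are an assumption, and only 1000 resamples are drawn to keep the example quick.

```r
set.seed(1)  # only so this illustration is reproducible

# Hypothetical individual masses (g) for two communities
masses_a <- c(8, 9, 10, 25, 26)
masses_b <- c(9, 24, 26, 110, 120, 130)

pool <- c(masses_a, masses_b)
# Assumed shared size classes of 0.2 ln units spanning the pooled masses
breaks <- seq(floor(min(log(pool))), ceiling(max(log(pool))), by = 0.2)

# Proportion of energy use per size class, over the shared breaks
bsed_props <- function(mass_g, breaks) {
  size_class <- cut(log(mass_g), breaks = breaks, include.lowest = TRUE)
  energy <- tapply(mass_g^0.75, size_class, sum)
  energy[is.na(energy)] <- 0  # empty classes use no energy
  as.numeric(energy) / sum(energy)
}

pair_doi <- function(a, b) sum(abs(bsed_props(a, breaks) - bsed_props(b, breaks)))

empirical_doi <- pair_doi(masses_a, masses_b)

# Resample both communities from the pool, with replacement, at their original sizes
sim_dois <- replicate(1000, {
  sim_a <- sample(pool, size = length(masses_a), replace = TRUE)
  sim_b <- sample(pool, size = length(masses_b), replace = TRUE)
  pair_doi(sim_a, sim_b)
})

# Proportion of sample DOIs greater (less overlap) than the empirical value
p_value <- mean(sim_dois > empirical_doi)
```

Pooling before resampling encodes the null hypothesis that both communities draw individuals from the same mass distribution, so a small p-value indicates less overlap than expected under that null.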

#### Translation to `replicate-becs`
- For every pair of communities, pool all the masses
- Resample two communities of the right sizes
- Construct BSEDs for both communities
- Calculate the DOI of the two BSEDs
- Repeat 10000x

```{r pairs for crosscommunity BSED comparisons}
library(magrittr)  # provides %>%; dplyr is only called via dplyr:: below

# All unordered pairs of community indices, one row per pair
community_combination_indices = utils::combn(x = 1:9, m = 2, simplify = TRUE) %>%
  t() %>%
  as.data.frame() %>%
  dplyr::rename(community_a = V1, community_b = V2)

# Bundle the two communities in a pair, plus their names, into one list
combine_communities = function(indices, communities) {
  community_combination = list(
    community_a = communities[[indices[1]]],
    community_b = communities[[indices[2]]],
    community_names = c(names(communities)[[indices[1]]], names(communities)[[indices[2]]])
  )

  return(community_combination)
}

community_combinations = apply(community_combination_indices, MARGIN = 1, FUN = combine_communities, communities = communities)

```

```{r cross community BSED comparisons}

# Note: nbootstraps = 10 here, versus the 10000 repeats described above
bsed_crosscomm_bootstraps = lapply(community_combinations, FUN = community_bootstrap, 
                                   bootstrap_function = 'bootstrap_crosscomm_bseds', nbootstraps = 10)


```

```{r plot cross community comparisons, echo = F, fig.height=30, fig.width=10}

crosscomm_bootstrap_plot = plot_crosscomm_bseds(bsed_crosscomm_bootstraps)

invisible(crosscomm_bootstrap_plot)

```

```{r plot cross community p values, echo = F, fig.height = 5, fig.width = 5}
pvals_histogram = plot_pvals(bsed_crosscomm_bootstraps)

pvals_histogram
```


See the histogram of p-values above to judge whether communities' BSEDs are the same or different.

### Testing BSDs for uniformity

#### Ernest approach
- $\delta$-corrected Kolmogorov-Smirnov test. 
  - "The $\delta$-corrected K-S test increases the power of the test when sample sizes are small (n < 25; Zar 1999)"
- The $\delta$-corrected test is not widely discussed online. 

#### Translation to `replicate-becs`

*From Zar (1999) _Biostatistical Analysis_.*

##### Base K-S test
- Take vector of measurements $X_i$. 
- For each $X_i$ record the observed frequency $f_i$ (number of observations with that value).
- Determine cumulative observed frequencies $F_i$ and cumulative relative frequencies $\textrm{rel}F_i$:
    - $\textrm{rel}F_i = \frac{F_i}{n}$ where $n$ is the number of measurements taken. 
    - $\textrm{rel}F_i$ is the proportion of the sample that is measurements $\leq X_i$. 
- For each $X_i$, determine the cumulative *relative* expected frequency from the comparison distribution, $\textrm{rel}\hat{F_i}$.
    - For a uniform distribution, $\textrm{rel}\hat{F_i} = \frac{X_i - \min(X)}{\max(X) - \min(X)}$
- Determine $D_i$ and $D'_i$ as:
    - $D_i = |{\textrm{rel}F_i - \textrm{rel}\hat{F_i}}|$
    - $D'_i = |{\textrm{rel}F_{i-1} - \textrm{rel}\hat{F_i}}|$
        - note $F_0 = 0$ so $D'_1 = \textrm{rel}\hat{F_1}$
- The test statistic $D$ is:
    - $D = \max[\max_i(D_i), \max_i(D'_i)]$
- Compare to critical values from appendix.
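
The statistic described above can be computed in a few lines of base R. This sketch follows the listed steps for a uniform comparison distribution; `sketch_ks_uniform` is a hypothetical name, not the package's `zar_ks_test`, and the critical-value lookup is not reproduced.

```r
# Hypothetical helper: K-S statistic D against a uniform distribution on [min(x), max(x)]
sketch_ks_uniform <- function(x) {
  n <- length(x)
  x_i <- sort(unique(x))
  # Observed frequency of each distinct value, then cumulative frequencies
  f_i <- tabulate(match(x, x_i))
  F_i <- cumsum(f_i)
  rel_F <- F_i / n
  # Cumulative relative expected frequency under the uniform distribution
  rel_F_hat <- (x_i - min(x)) / (max(x) - min(x))
  D_i <- abs(rel_F - rel_F_hat)
  # D'_i compares the previous cumulative relative frequency (with F_0 = 0)
  D_prime_i <- abs(c(0, rel_F[-length(rel_F)]) - rel_F_hat)
  max(max(D_i), max(D_prime_i))
}

# Made-up measurements, including one tied value
ks_example <- sketch_ks_uniform(c(1, 2, 2, 4, 5))
```

For these measurements the largest gap occurs at $X_i = 2$, where three of five observations have accumulated but the uniform expectation is only 0.25.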

##### $\delta$-corrected KS test

- For small sample sizes (<25) we can obtain increased power using the $\delta$-corrected KS test.
- For each $i$ determine
    - $\textrm{rel}G_i = \frac{F_i}{n + 1}$
    - $\textrm{rel}G'_i = \frac{F_i - 1}{n - 1}$
- Then obtain similar $D$s
    - $D_{0, i} = |\textrm{rel}G_i - \textrm{rel}\hat{F_i}|$
    - $D_{1, i} = |\textrm{rel}G'_i - \textrm{rel}\hat{F_i}|$
- The test statistic is either $\max(D_{0, i})$ or $\max(D_{1, i})$, whichever leads to the highest level of significance/smallest probability. Look up significance in table from appendix. The 1 and 0 are the $\delta$s. 
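
A matching sketch for the two $\delta$-corrected statistics, again against a uniform distribution. `sketch_delta_ks_uniform` is a hypothetical name; choosing between $D_0$ and $D_1$ requires the critical-value tables, which are not reproduced here.

```r
# Hypothetical helper: delta-corrected K-S statistics against a uniform
# distribution on [min(x), max(x)]; requires more than one observation
sketch_delta_ks_uniform <- function(x) {
  n <- length(x)
  stopifnot(n > 1)
  x_i <- sort(unique(x))
  F_i <- cumsum(tabulate(match(x, x_i)))
  rel_F_hat <- (x_i - min(x)) / (max(x) - min(x))
  rel_G <- F_i / (n + 1)              # delta = 0
  rel_G_prime <- (F_i - 1) / (n - 1)  # delta = 1
  c(
    D_0 = max(abs(rel_G - rel_F_hat)),
    D_1 = max(abs(rel_G_prime - rel_F_hat))
  )
}

# Made-up measurements for illustration
delta_example <- sketch_delta_ks_uniform(c(1, 4, 5, 6, 10))
delta_example
```

The function returns both candidates so that each can be looked up against its own column of critical values.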


Tables of critical values were entered by hand from the appendix to Zar (1999). Note that the $\delta$-correction is only included in the seventh edition. 

```{r bsds deltaks to uniform}

bsd_uniform_ks = lapply(bsds, FUN = zar_ks_test, delta_correction = TRUE)

```
